We present a novel deep neural network architecture for representing robot experiences in an episodic-like memory that facilitates encoding, recalling, and predicting action experiences. Our proposed unsupervised deep episodic memory model 1) encodes observed actions in a latent vector space and, based on this latent encoding, 2) infers action categories, 3) reconstructs original frames, and 4) predicts future frames. We evaluate the proposed model on two different large-scale action datasets. Results show that conceptually similar actions are mapped into the same region of the latent vector space, i.e., the conceptual similarity of videos is reflected by the proximity of their vector representations in the latent space. Based on this contribution, we introduce an action matching and retrieval mechanism and evaluate its performance and generalization capability on a real humanoid robot in an action execution scenario.